-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: return fully-resolved route to NHT clients #16793
base: master
Are you sure you want to change the base?
Conversation
This is pretty much a hard no from me. Having BGP act as a underlay for itself is something that is allowed and frankly lots of people are using. Can you explain your reasoning on why zebra is sending the wrong metric value up? |
Failure case: DEBUG:r1:cmd_status("/bin/bash -c 'vtysh -c '"'"'show bgp ipv4 unicast 10.0.0.4/32 json'"'"' 2>/dev/null'") |
Don, as you said "Having BGP act as a underlay for itself is something that is allowed and frankly lots of people are using". that's exactly when we run into the issue that the BGP bestpath is wrong. Please take a look at the simple topotest. |
You need to give me more data than a simple show command in order for me to understand the problem. Please show me the routes in zebra as well as in bgp as well as your config. Else I cannot see anything wrong. A metric of 0 in isolation without any other context is next to useless. |
The configs and the topology are all in the topotest I have included in the PR. I will get the zebra output. |
I have attached the relevant output from the topotest: output.fail.txt Please reference the topology and description in test_nht_double_recusion.py. The additional output is generated by applying the following diff to the topotest: |
Can I get some advice about this test failure: What is our recommendation for handling this case?
|
You have to fix the test too (to expect the same output as it's from the CLI). |
Thanks for the advice. Let me remove the extra text from the current PR, and handle it as a separate item. |
I had to rebase in order to update the style issues found by "frrbot". But so many tags have been added. Not sure what I have done wrong. Should I create a new PR? |
The branch got messed up during updates. Close this PR, and opened a new PR: #16832 |
it's not clear to me at this point in time what will happen if we have multiple recursive resolutions across multiple protocols what will happen here. I still need to spend more time with this |
imo this code is broken. If I add this: r1(config)# ip route 192.168.12.2/32 192.168.12.99 r1-eth0 we get this:
and this:
The 192.168.24.4 nht code should have switched over to the static routes. It is not doing this. This will seriously break allot of production code. I also think that by only looking at the first nexthop you will end up in situations where it is ignoring a recursive route resolution that uses multiple paths. Additionally what happens if those multiple paths have different weights? It's not clear to me what weight should be used then. |
658b99a
to
33e16c9
Compare
I'm really not convinced that this is an actual nexthop tracking problem that should be resolved this way. I really do not believe that we should be modifying what is sent up here in this manner. I would recommend implementing one of these two solutions instead: a) When passing up a resolved nexthop for nexthop tracking add a metric value, per nexthop sent. That way the upper level protocol would get the metric of the route entry as well as the metric of the individual nexthops. Then an upper level protocol that cared could make a decision about what it wants to do with that information. b) Modify the upper level protocol to notice that the metric for a particular nexthop is X then when only installing the route into zebra modify the sent down metric. |
Thanks @donaldsharp. I will study your idea and patch in #16917 |
36364d6
to
6b9e141
Compare
@donaldsharp The revised patch is along the line of a) in your recommendation. The metric field already exists when the nexthop is updated to the clients. The current mechanism is working for non-recursive nexthops. We just need to get the correct value for the recursive nexthops. It's good that no other fields in the rnh or logic need to be changed. Thanks for your review and suggestions. |
@donaldsharp @ton31337 @mwinter-osr @riw777 The latest patch has much narrower scope. It only calculates the IGP metric without touching the existing "re" and the nexthop lists in the rnh structure. Currently the IGP metric is already correct for non-recursive nexthops. The patch will make it correct for recursive nexthops as well. Please review. Thanks. |
In addition to the route metric and nexthops, several other parameters (e.g., prefix and protocol type) are also sent to clients. I will update the patch shortly. |
a14ca6f
to
6305c05
Compare
The recursive route is problematic in these cases as well: Both BGP and staticd check for the route type. These checks should be on
|
zebra/rib.h
Outdated
uint8_t resolved_distance; | ||
uint16_t resolved_instance; | ||
uint32_t resolved_metric; | ||
struct prefix resolved_prefix; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from resolved_route
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resolve_route is the one from the first lookup in zebra_rnh_resolve_nexthop_entry(). In case of a recursive nexthop, the resolved_route is different from the fully-resolved route.
Found an issue with the patch. There can still be inconsistencies between the route and the nexthop. |
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
784909f
to
a622cb1
Compare
@donaldsharp @ton31337 @mwinter-osr @riw777 I have re-worked the patch. Please review. Thanks. There is one topotest failure in which the resolved nextops (for FIB) are different. I don't understand how the NHT could impact the real nexthop resolution. Can I get some help with that topotest? |
Add a topotest case that a BGP nexthop being tracked in nht is first resolved to another BGP route, and then recursively resolved to an OSPF route. As a result, the correct IGP metric from OSPF is given to BGP for bestpath calculation. Signed-off-by: Enke Chen <[email protected]>
When a nexthop being tracked is resolved to a recursive route at the first level of recursion, several parameters (such as the prefix, metric and protocol type) of the recursive route are sent to clients, along with the fully-resolved nexthop. This creates an inconsistency or de-coupling between the route and its nexthop, that is, the fully-resolved nexthop would belong to a different route that is non-recursive. Thus the update message to clients would contain parameters like prefix, metric and protocl type from a recursive route, and the nexthop from a non-recursive route. In particular, the protocol type and the metric fields in the update message no longer reflect the property of the fully-resolved nexthop in the meesage. It would be diffcult for a client to make sense of these parameters. As an example, when the nexthop being tracked is resolved to a BGP route, instead of further resolving to an IGP route and getting its metric, the MED value of the BGP route would be taken as the metric and given to BGP for bestpath calculation, resulting in incorrect route selection. The fix is to always calculate and return the fully-resolved route to clients so that the route parameters like protocol type, distance, metric and the resolved nexthops are consistent. When a nht nexthop is resolved recursively, associate the nht nexthop with each of the recursive routes so that the nht entry is re-evaluated when there is a change to any of the dependent routes. Signed-off-by: Enke Chen <[email protected]>
a622cb1
to
af49397
Compare
Brief summary: Currently recursive route + fully-resolved nexthop are sent to
NHT client. Proposed fix is to send fully-resolved (route + nexthop) instead.
When a nexthop being tracked is resolved to a recursive route at the
first level of recursion, several parameters (such as the prefix,
metric and protocol type) of the recursive route are sent to clients,
along with the fully-resolved nexthop.
This creates an inconsistency or de-coupling between the route and its
nexthop, that is, the fully-resolved nexthop would belong to a different
route that is non-recursive. Thus the update message to clients would
contain parameters like prefix, metric and protocl type from a recursive
route, and the nexthop from a non-recursive route.
In particular, the protocol type and the metric fields in the update
message no longer reflect the property of the fully-resolved nexthop in
the meesage. It would be diffcult for a client to make sense of these
parameters.
As an example, when the nexthop being tracked is resolved to a BGP route,
instead of further resolving to an IGP route and getting its metric, the
MED value of the BGP route would be taken as the metric and give to BGP
for bestpath calculation.
The fix is to always calculate and return the fully-resolved route to
clients so that the route parameters like protocol type, distance,
metric and the resolved nexthops are consistent.
When a nht nexthop is resolved recursively, associate the nht nexthop
with each of the recursive routes so that the nht entry is re-evaluated
when there is a change to any of the dependent routes.